Towards Understanding Asynchronous Advantage Actor-Critic: Convergence and Linear Speedup

Authors

Abstract

Asynchronous and parallel implementation of standard reinforcement learning (RL) algorithms is a key enabler of the tremendous success of modern RL. Among the many asynchronous RL algorithms, arguably the most popular and effective one is the asynchronous advantage actor-critic (A3C) algorithm. Although A3C is becoming the workhorse of RL, its theoretical properties are still not well understood, including a non-asymptotic analysis and the performance gain of parallelism (a.k.a. linear speedup). This paper revisits the A3C algorithm and establishes its convergence guarantees. Under both i.i.d. and Markovian sampling, we establish a local convergence guarantee in the general policy approximation case and a global convergence guarantee under softmax policy parameterization. A3C obtains a sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ per worker to achieve $\epsilon$ accuracy, where $N$ is the number of workers. Compared with the best-known sample complexity of $\mathcal{O}(\epsilon^{-2.5})$ for two-timescale AC, A3C achieves linear speedup, which justifies the benefit of asynchrony in AC theoretically for the first time. Numerical tests on a synthetic environment, OpenAI Gym environments, and Atari games are provided to verify our analysis.
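
To make the setting concrete, below is a minimal, illustrative sketch of an A3C-style update on a toy synthetic MDP: several parallel workers sample transitions and apply actor (softmax policy gradient) and critic (TD) updates directly to shared parameters without locking. The toy MDP, step sizes, and all variable names are assumptions made for this example; this is not the paper's exact algorithm or experimental setup.

```python
# Illustrative A3C-style asynchronous actor-critic on a toy synthetic MDP.
# Sketch only: one-step TD-error advantage, tabular critic, softmax policy.
import threading
import numpy as np

N_STATES, N_ACTIONS, N_WORKERS, STEPS = 5, 2, 4, 2000
GAMMA, ALPHA_ACTOR, ALPHA_CRITIC = 0.95, 0.05, 0.1

rng = np.random.default_rng(0)
# Random synthetic MDP: P[s, a] is a distribution over next states, R[s, a] a reward.
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
R = rng.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))

# Shared parameters, updated asynchronously by all workers without locks.
theta = np.zeros((N_STATES, N_ACTIONS))  # softmax policy parameters (actor)
v = np.zeros(N_STATES)                   # tabular value estimates (critic)

def softmax_policy(s):
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def worker(seed):
    local_rng = np.random.default_rng(seed)
    s = int(local_rng.integers(N_STATES))
    for _ in range(STEPS):
        pi = softmax_policy(s)
        a = int(local_rng.choice(N_ACTIONS, p=pi))
        s_next = int(local_rng.choice(N_STATES, p=P[s, a]))
        r = R[s, a]
        # Advantage estimate via the TD error of the (possibly stale) shared critic.
        delta = r + GAMMA * v[s_next] - v[s]
        # Critic update: TD(0) step applied directly to the shared parameters.
        v[s] += ALPHA_CRITIC * delta
        # Actor update: policy-gradient step weighted by the advantage estimate.
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta[s] += ALPHA_ACTOR * delta * grad_log_pi
        s = s_next

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("learned policy (rows = states):")
print(np.array([softmax_policy(s) for s in range(N_STATES)]).round(2))
```

In the paper's analysis, it is the $N$ workers jointly driving the shared parameters that yields the $\mathcal{O}(\epsilon^{-2.5}/N)$ per-worker sample complexity, i.e., the linear speedup.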


Similar Articles

Linear Off-Policy Actor-Critic

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off...
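
As a rough illustration of the setting described above, the following sketch combines a fixed behavior policy, an importance-sampling ratio, and linear actor and critic updates whose per-step cost is linear in the number of weights. The feature map, toy dynamics, and step sizes are invented for the example; this is a simplified stand-in, not that paper's exact algorithm.

```python
# Illustrative off-policy actor-critic with linear function approximation.
# Sketch only: importance-weighted TD(0) critic and softmax actor, no traces.
import numpy as np

N_FEATURES, N_ACTIONS = 4, 2
GAMMA, ALPHA_W, ALPHA_THETA = 0.9, 0.1, 0.01

rng = np.random.default_rng(1)
w = np.zeros(N_FEATURES)                    # linear critic weights
theta = np.zeros((N_ACTIONS, N_FEATURES))   # linear softmax actor weights

def features(s):
    # Hypothetical one-hot features for a small integer state.
    phi = np.zeros(N_FEATURES)
    phi[s % N_FEATURES] = 1.0
    return phi

def target_policy(phi):
    z = theta @ phi
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

behavior = np.full(N_ACTIONS, 1.0 / N_ACTIONS)   # fixed uniform behavior policy

s = 0
for _ in range(5000):
    phi = features(s)
    a = int(rng.choice(N_ACTIONS, p=behavior))   # act with the behavior policy
    s_next = (s + a + int(rng.integers(2))) % 7  # toy dynamics for illustration
    r = 1.0 if s_next == 0 else 0.0
    phi_next = features(s_next)

    pi = target_policy(phi)
    rho = pi[a] / behavior[a]                    # importance-sampling ratio
    delta = r + GAMMA * w @ phi_next - w @ phi   # TD error under the critic

    w += ALPHA_W * rho * delta * phi             # corrected critic update
    grad_log = -np.outer(pi, phi)                # grad of log pi(a|s) for softmax
    grad_log[a] += phi
    theta += ALPHA_THETA * rho * delta * grad_log
    s = s_next

print("state values:", [round(float(w @ features(s)), 2) for s in range(7)])
```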


Asynchronous Advantage Actor-Critic with Adam Optimization and a Layer Normalized Recurrent Network

State-of-the-art deep reinforcement learning models rely on asynchronous training using multiple learner agents and their collective updates to a central neural network. In this thesis, one of the most recent asynchronous policy gradient-based reinforcement learning methods, i.e. asynchronous advantage actor-critic (A3C), will be examined as well as improved using prior research from the machine...


Towards Feature Selection In Actor-Critic Algorithms

Choosing features for the critic in actor-critic algorithms with function approximation is known to be a challenge. Too few critic features can lead to degeneracy of the actor gradient, and too many features may lead to slower convergence of the learner. In this paper, we show that a well-studied class of actor policies satisfy the known requirements for convergence when the actor features are ...


Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation

Actor-critic algorithms for reinforcement learning are achieving renewed popularity due to their good convergence properties in situations where other approaches often fail (e.g., when function approximation is involved). Interestingly, there is growing evidence that actor-critic approaches based on phasic dopamine signals play a key role in biological learning through cortical and basal gangli...


Hierarchical Actor-Critic

The ability to learn at different resolutions in time may help overcome one of the main challenges in deep reinforcement learning — sample efficiency. Hierarchical agents that operate at different levels of temporal abstraction can learn tasks more quickly because they can divide the work of learning behaviors among multiple policies and can also explore the environment at a higher level. In th...



Journal

Journal title: IEEE Transactions on Signal Processing

Year: 2023

ISSN: 1053-587X, 1941-0476

DOI: https://doi.org/10.1109/tsp.2023.3268475